Reading and Summarizing a MAF file

Required Input files

Three different types of input files are used. They are - MAF file- which is gz compressed - Clinical data associated with each sample / Tumor_sample_barcode in MAF (optional) - Copy number data if its available

Reading MAF files

The read.maf function is utilized for reading and processing MAF files, subsequently storing the information as a MAF object. Additionally, it can accept input for the tumor sample barcode and copy number data. By default, variants with high or moderate consequences are regarded as non-synonymous, but this can be adjusted as required using the vc_nonSyn parameter in read.maf.

MAF object

A MAF object comprises the MAF file itself, along with summarized information and related sample annotations. To access valuable components within MAF objects, one can employ the accessor methods.

laml
## An object of class  MAF 
##                    ID          summary  Mean Median
##  1:        NCBI_Build               37    NA     NA
##  2:            Center genome.wustl.edu    NA     NA
##  3:           Samples              193    NA     NA
##  4:            nGenes             1241    NA     NA
##  5:   Frame_Shift_Del               52 0.269      0
##  6:   Frame_Shift_Ins               91 0.472      0
##  7:      In_Frame_Del               10 0.052      0
##  8:      In_Frame_Ins               42 0.218      0
##  9: Missense_Mutation             1342 6.953      7
## 10: Nonsense_Mutation              103 0.534      0
## 11:       Splice_Site               92 0.477      0
## 12:             total             1732 8.974      9
#Shows sample summary.
getSampleSummary(laml)
##      Tumor_Sample_Barcode Frame_Shift_Del Frame_Shift_Ins In_Frame_Del
##   1:         TCGA-AB-3009               0               5            0
##   2:         TCGA-AB-2807               1               0            1
##   3:         TCGA-AB-2959               0               0            0
##   4:         TCGA-AB-3002               0               0            0
##   5:         TCGA-AB-2849               0               1            0
##  ---                                                                  
## 189:         TCGA-AB-2942               0               0            0
## 190:         TCGA-AB-2946               0               0            0
## 191:         TCGA-AB-2954               0               0            0
## 192:         TCGA-AB-2982               0               0            0
## 193:         TCGA-AB-2903               0               0            0
##      In_Frame_Ins Missense_Mutation Nonsense_Mutation Splice_Site total
##   1:            1                25                 2           1    34
##   2:            0                16                 3           4    25
##   3:            0                22                 0           1    23
##   4:            0                15                 1           5    21
##   5:            0                16                 1           2    20
##  ---                                                                   
## 189:            1                 0                 0           0     1
## 190:            0                 1                 0           0     1
## 191:            0                 1                 0           0     1
## 192:            0                 1                 0           0     1
## 193:            0                 0                 0           0     0
#Shows gene summary.
getGeneSummary(laml)
##       Hugo_Symbol Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins
##    1:        FLT3               0               0            1           33
##    2:      DNMT3A               4               0            0            0
##    3:        NPM1               0              33            0            0
##    4:        IDH2               0               0            0            0
##    5:        IDH1               0               0            0            0
##   ---                                                                      
## 1237:      ZNF689               0               0            0            0
## 1238:      ZNF75D               0               0            0            0
## 1239:      ZNF827               1               0            0            0
## 1240:       ZNF99               0               0            0            0
## 1241:        ZPBP               0               0            0            0
##       Missense_Mutation Nonsense_Mutation Splice_Site total MutatedSamples
##    1:                15                 0           3    52             52
##    2:                39                 5           6    54             48
##    3:                 1                 0           0    34             33
##    4:                20                 0           0    20             20
##    5:                18                 0           0    18             18
##   ---                                                                     
## 1237:                 1                 0           0     1              1
## 1238:                 1                 0           0     1              1
## 1239:                 0                 0           0     1              1
## 1240:                 1                 0           0     1              1
## 1241:                 1                 0           0     1              1
##       AlteredSamples
##    1:             52
##    2:             48
##    3:             33
##    4:             20
##    5:             18
##   ---               
## 1237:              1
## 1238:              1
## 1239:              1
## 1240:              1
## 1241:              1
#shows clinical data associated with samples
getClinicalData(laml)
##      Tumor_Sample_Barcode FAB_classification days_to_last_followup
##   1:         TCGA-AB-2802                 M4                   365
##   2:         TCGA-AB-2803                 M3                   792
##   3:         TCGA-AB-2804                 M3                  2557
##   4:         TCGA-AB-2805                 M0                   577
##   5:         TCGA-AB-2806                 M1                   945
##  ---                                                              
## 189:         TCGA-AB-3007                 M3                  1581
## 190:         TCGA-AB-3008                 M1                   822
## 191:         TCGA-AB-3009                 M4                   577
## 192:         TCGA-AB-3011                 M1                  1885
## 193:         TCGA-AB-3012                 M3                  1887
##      Overall_Survival_Status
##   1:                       1
##   2:                       1
##   3:                       0
##   4:                       1
##   5:                       1
##  ---                        
## 189:                       0
## 190:                       1
## 191:                       1
## 192:                       0
## 193:                       0
#Shows all fields in MAF
getFields(laml)
##  [1] "Hugo_Symbol"            "Entrez_Gene_Id"         "Center"                
##  [4] "NCBI_Build"             "Chromosome"             "Start_Position"        
##  [7] "End_Position"           "Strand"                 "Variant_Classification"
## [10] "Variant_Type"           "Reference_Allele"       "Tumor_Seq_Allele1"     
## [13] "Tumor_Seq_Allele2"      "Tumor_Sample_Barcode"   "Protein_Change"        
## [16] "i_TumorVAF_WU"          "i_transcript_name"
#Writes maf summary to an output file with basename laml.
write.mafSummary(maf = laml, basename = 'laml')

Visualization

Plotting MAF summary

The plotmafSummary function enables summarization of the MAF file, presenting the variant count per sample through a stacked barplot, and exhibiting variant types via box plots.

For a simplified barplot of mutated genes, we use the mafbarplot function.

plotmafSummary(maf = laml, rmOutlier = TRUE, addStat = 'median', dashboard = TRUE, titvRaw = FALSE)

Oncoplots and waterfall plots are better options for a better visualization.

NOTE: Variants annotated as Multi_Hit are those genes which are mutated more than once in the same sample.

Oncoplots

#oncoplot for top ten mutated genes.
oncoplot(maf = laml, top = 10)

Transition and Transversions

The titv function categorizes Transitions and Transversions in SNPs, providing a summary table in diverse formats. This data can be depicted as a layered barplot that shows the conversion percentages for each sample, as well as a boxplot that illustrates the distribution of six unique conversion types.

laml.titv = titv(maf = laml, plot = FALSE, useSyn = TRUE)
#plot titv summary
plotTiTv(res = laml.titv)

Lollipop plots for amino acid changes

To employ the lollipopPlot function, information about amino acid alterations in the MAF file is necessary. However, the field (or column) names representing these changes in MAF files differ across studies due to the absence of standardized guidelines. By default, lollipopPlot looks for the AAChange column in the MAF file and, if unavailable, issues a warning while displaying all accessible fields. In the example provided, the MAF file records amino acid modifications in a field or column titled “Protein_Change.” To specify this explicitly, we will use the AACol parameter.

#lollipop plot for DNMT3A, which is one of the most frequent mutated gene in Leukemia.
lollipopPlot(
  maf = laml,
  gene = 'DNMT3A',
  AACol = 'Protein_Change',
  showMutationRate = TRUE,
  labelPos = 882
)
## 3 transcripts available. Use arguments refSeqID or proteinID to manually specify tx name.
##      HGNC refseq.ID protein.ID aa.length
## 1: DNMT3A NM_022552  NP_072046       912
## 2: DNMT3A NM_153759  NP_715640       723
## 3: DNMT3A NM_175629  NP_783328       912
## Using longer transcript NM_022552 for now.
## Removed 3 mutations for which AA position was not available

plotProtein(gene = "TP53", refSeqID = "NM_000546")

Cancer genomes, especially solid tumors, frequently exhibit genomic regions with concentrated hyper-mutations5. Visualizing the distance between variants on a linear genomic scale can help identify these hyper-mutated areas. These visualizations are commonly called rainfall plots, which can be generated using the rainfallPlot function. By setting the detectChangePoints parameter to TRUE, the rainfall plot can also highlight potential regions of change in the distances between events

Rainfall Plots

brca <- system.file("extdata", "brca.maf.gz", package = "maftools")
brca = read.maf(maf = brca, verbose = FALSE)
rainfallPlot(maf = brca, detectChangePoints = TRUE, pointSize = 0.4)
## Processing TCGA-A8-A08B..
## Kataegis detected at:
##    Chromosome Start_Position End_Position nMuts Avg_intermutation_dist Size
## 1:          8       98129348     98133560     7               702.0000 4212
## 2:          8       98398549     98403536     9               623.3750 4987
## 3:          8       98453076     98456466     9               423.7500 3390
## 4:          8      124090377    124096810    22               306.3333 6433
## 5:         12       97436055     97439705     7               608.3333 3650
## 6:         17       29332072     29336153     8               583.0000 4081
##    Tumor_Sample_Barcode C>G C>T
## 1:         TCGA-A8-A08B   4   3
## 2:         TCGA-A8-A08B   1   8
## 3:         TCGA-A8-A08B   1   8
## 4:         TCGA-A8-A08B   1  21
## 5:         TCGA-A8-A08B   4   3
## 6:         TCGA-A8-A08B   4   4

Compare mutation load against TCGA cohorts

To contrast mutation burden with 33 TCGA cohorts, the tcgaCompare function utilizes mutation data from TCGA MC3. The resulting plot bears a resemblance to the one presented by Alexandrov et al.5.

laml.mutload = tcgaCompare(maf = laml, cohortName = 'Example-LAML', logscale = TRUE, capture_size = 50)
## Warning in FUN(X[[i]], ...): Removed 1 samples with zero mutations.
## Capture size [TCGA]:  35.8
## Capture size [Input]: 50
## Performing pairwise t-test for differences in mutation burden (per MB)..

Plotting VAF

For a rapid assessment of the clonal status of commonly mutated genes, this utility presents variant allele frequencies in the form of a boxplot. Clonal genes generally exhibit mean allele frequencies near 50%, assuming a pure sample.

plotVaf(maf = laml, vafCol = 'i_TumorVAF_WU')

Analysis

Somatic Interactions

The somaticInteractions function can be employed to detect sets of genes that are either mutually exclusive or co-occurring. This is achieved through a pair-wise Fisher’s exact test, which identifies significant relationships between these genes.

#exclusive/co-occurance event analysis on top 10 mutated genes. 
somaticInteractions(maf = laml, top = 25, pvalue = c(0.05, 0.1))

##      gene1  gene2       pValue oddsRatio  00 11 01 10        pAdj
##   1: ASXL1  RUNX1 0.0001541586 55.215541 176  4 12  1 0.003568486
##   2:  IDH2  RUNX1 0.0002809928  9.590877 164  7  9 13 0.006055880
##   3:  IDH2  ASXL1 0.0004030636 41.077327 172  4  1 16 0.008126283
##   4:  FLT3   NPM1 0.0009929836  3.763161 125 17 16 35 0.018664260
##   5:  SMC3 DNMT3A 0.0010451985 20.177713 144  6 42  1 0.018664260
##  ---                                                             
## 296: PLCE1  ASXL1 1.0000000000  0.000000 184  0  5  4 1.000000000
## 297: RAD21  FAM5C 1.0000000000  0.000000 183  0  5  5 1.000000000
## 298: PLCE1  FAM5C 1.0000000000  0.000000 184  0  5  4 1.000000000
## 299: PLCE1  RAD21 1.0000000000  0.000000 184  0  5  4 1.000000000
## 300:  EZH2  PLCE1 1.0000000000  0.000000 186  0  4  3 1.000000000
##                   Event         pair event_ratio
##   1:       Co_Occurence ASXL1, RUNX1        4/13
##   2:       Co_Occurence  IDH2, RUNX1        7/22
##   3:       Co_Occurence  ASXL1, IDH2        4/17
##   4:       Co_Occurence   FLT3, NPM1       17/51
##   5:       Co_Occurence DNMT3A, SMC3        6/43
##  ---                                            
## 296: Mutually_Exclusive ASXL1, PLCE1         0/9
## 297: Mutually_Exclusive FAM5C, RAD21        0/10
## 298: Mutually_Exclusive FAM5C, PLCE1         0/9
## 299: Mutually_Exclusive PLCE1, RAD21         0/9
## 300: Mutually_Exclusive  EZH2, PLCE1         0/7

Detecting cancer driver genes based on positional clustering

Maftools includes a feature called Oncodrive, which identifies cancer-causing genes (drivers) within a given MAF file. This feature is grounded in the oncodriveCLUST algorithm’s Python implementation. The underlying concept stems from the observation that most gene mutations related to cancer tend to concentrate at a limited number of specific sites, also known as hotspots. Oncodrive leverages these hotspots to detect cancer-related genes.

laml.sig = oncodrive(maf = laml, AACol = 'Protein_Change', minMut = 5, pvalMethod = 'zscore')
## Warning in oncodrive(maf = laml, AACol = "Protein_Change", minMut = 5,
## pvalMethod = "zscore"): Oncodrive has been superseeded by OncodriveCLUSTL. See
## http://bg.upf.edu/group/projects/oncodrive-clust.php
## Estimating background scores from synonymous variants..
## Not enough genes to build background. Using predefined values. (Mean = 0.279; SD = 0.13)
## Estimating cluster scores from non-syn variants..
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |======================================================================| 100%
## Comapring with background model and estimating p-values..
## Done !
head(laml.sig)
##    Hugo_Symbol Frame_Shift_Del Frame_Shift_Ins In_Frame_Del In_Frame_Ins
## 1:        IDH1               0               0            0            0
## 2:        IDH2               0               0            0            0
## 3:        NPM1               0              33            0            0
## 4:        NRAS               0               0            0            0
## 5:       U2AF1               0               0            0            0
## 6:         KIT               1               1            0            1
##    Missense_Mutation Nonsense_Mutation Splice_Site total MutatedSamples
## 1:                18                 0           0    18             18
## 2:                20                 0           0    20             20
## 3:                 1                 0           0    34             33
## 4:                15                 0           0    15             15
## 5:                 8                 0           0     8              8
## 6:                 7                 0           0    10              8
##    AlteredSamples clusters muts_in_clusters clusterScores protLen   zscore
## 1:             18        1               18     1.0000000     414 5.546154
## 2:             20        2               20     1.0000000     452 5.546154
## 3:             33        2               32     0.9411765     294 5.093665
## 4:             15        2               15     0.9218951     189 4.945347
## 5:              8        1                7     0.8750000     240 4.584615
## 6:              8        2                9     0.8500000     976 4.392308
##            pval          fdr fract_muts_in_clusters
## 1: 1.460110e-08 1.022077e-07              1.0000000
## 2: 1.460110e-08 1.022077e-07              1.0000000
## 3: 1.756034e-07 8.194826e-07              0.9411765
## 4: 3.800413e-07 1.330144e-06              1.0000000
## 5: 2.274114e-06 6.367520e-06              0.8750000
## 6: 5.607691e-06 1.308461e-05              0.9000000
plotOncodrive(res = laml.sig, fdrCutOff = 0.1, useFraction = TRUE, labelSize = 0.5)

Adding and summarizing pfam domains

Maftools incorporates the pfamDomains function, which appends pfam domain data to amino acid alterations. Additionally, pfamDomain provides a summary of amino acid changes based on the impacted domains. This insight assists in determining which domain is predominantly affected in a cancer cohort.

laml.pfam = pfamDomains(maf = laml, AACol = 'Protein_Change', top = 10)
## Warning in pfamDomains(maf = laml, AACol = "Protein_Change", top = 10): Removed
## 50 mutations for which AA position was not available

#Protein summary (Printing first 7 columns for display convenience)
laml.pfam$proteinSummary[,1:7, with = FALSE]
##         HGNC AAPos Variant_Classification  N total  fraction   DomainLabel
##    1: DNMT3A   882      Missense_Mutation 27    54 0.5000000 AdoMet_MTases
##    2:   IDH1   132      Missense_Mutation 18    18 1.0000000      PTZ00435
##    3:   IDH2   140      Missense_Mutation 17    20 0.8500000      PTZ00435
##    4:   FLT3   835      Missense_Mutation 14    52 0.2692308      PKc_like
##    5:   FLT3   599           In_Frame_Ins 10    52 0.1923077      PKc_like
##   ---                                                                     
## 1512: ZNF646   875      Missense_Mutation  1     1 1.0000000          <NA>
## 1513: ZNF687   554      Missense_Mutation  1     2 0.5000000          <NA>
## 1514: ZNF687   363      Missense_Mutation  1     2 0.5000000          <NA>
## 1515: ZNF75D     5      Missense_Mutation  1     1 1.0000000          <NA>
## 1516: ZNF827   427        Frame_Shift_Del  1     1 1.0000000          <NA>
#Domain summary (Printing first 3 columns for display convenience)
laml.pfam$domainSummary[,1:3, with = FALSE]
##        DomainLabel nMuts nGenes
##   1:      PKc_like    55      5
##   2:      PTZ00435    38      2
##   3: AdoMet_MTases    33      1
##   4:         7tm_1    24     24
##   5:       COG5048    17     17
##  ---                           
## 499:    ribokinase     1      1
## 500:   rim_protein     1      1
## 501: sigpep_I_bact     1      1
## 502:           trp     1      1
## 503:        zf-BED     1      1

Survival analysis

The mafSurvival function carries out survival analysis and generates Kaplan-Meier curves by grouping samples according to the mutation status of user-specified genes or manually provided samples. This function necessitates input data containing a tumor sample barcode, a binary event, and time to the event.

#Survival analysis based on grouping of DNMT3A mutation status
mafSurvival(maf = laml, genes = 'DNMT3A', time = 'days_to_last_followup', Status = 'Overall_Survival_Status', isTCGA = TRUE)
## Looking for clinical data in annoatation slot of MAF..
## Number of mutated samples for given genes:
## DNMT3A 
##     48
## Removed 11 samples with NA's
## Median survival..
##     Group medianTime   N
## 1: Mutant        245  45
## 2:     WT        396 137

identifies genes with poor survival rate

#Using top 20 mutated genes to identify a set of genes (of size 2) to predict poor prognostic groups
prog_geneset = survGroup(maf = laml, top = 20, geneSetSize = 2, time = "days_to_last_followup", Status = "Overall_Survival_Status", verbose = FALSE)
## Removed 11 samples with NA's
print(prog_geneset)
##     Gene_combination P_value    hr  WT Mutant
##  1:      FLT3_DNMT3A 0.00104 2.510 164     18
##  2:      DNMT3A_SMC3 0.04880 2.220 176      6
##  3:      DNMT3A_NPM1 0.07190 1.720 166     16
##  4:      DNMT3A_TET2 0.19600 1.780 176      6
##  5:        FLT3_TET2 0.20700 1.860 177      5
##  6:        NPM1_IDH1 0.21900 0.495 176      6
##  7:      DNMT3A_IDH1 0.29300 1.500 173      9
##  8:       IDH2_RUNX1 0.31800 1.580 176      6
##  9:        FLT3_NPM1 0.53600 1.210 165     17
## 10:      DNMT3A_IDH2 0.68000 0.747 178      4
## 11:      DNMT3A_NRAS 0.99200 0.986 178      4
mafSurvGroup(maf = laml, geneSet = c("DNMT3A", "FLT3"), time = "days_to_last_followup", Status = "Overall_Survival_Status")
## Looking for clinical data in annoatation slot of MAF..
## Removed 11 samples with NA's
## Median survival..
##     Group medianTime   N
## 1: Mutant      242.5  18
## 2:     WT      379.5 164

Comparing two cohorts (MAFs)

A recent study by Madan et al (9) revealed that PML and RARA gene mutations occurred in patients experiencing a relapse of Acute Promyelocytic Leukemia (APL) but were not detected during the initial stage of the illness. This finding implies that mutation patterns may differ among various cancer cohorts. To pinpoint differentially mutated genes between two cohorts, the mafCompare function performs a Fisher test on all genes.

#Primary APL MAF
primary.apl = system.file("extdata", "APL_primary.maf.gz", package = "maftools")
primary.apl = read.maf(maf = primary.apl)
## -Reading
## -Validating
## --Non MAF specific values in Variant_Classification column:
##   ITD
## -Silent variants: 45 
## -Summarizing
## -Processing clinical data
## --Missing clinical data
## -Finished in 0.106s elapsed (0.092s cpu)
#Relapse APL MAF
relapse.apl = system.file("extdata", "APL_relapse.maf.gz", package = "maftools")
relapse.apl = read.maf(maf = relapse.apl)
## -Reading
## -Validating
## --Non MAF specific values in Variant_Classification column:
##   ITD
## -Silent variants: 19 
## -Summarizing
## -Processing clinical data
## --Missing clinical data
## -Finished in 0.082s elapsed (0.077s cpu)
#Considering only genes which are mutated in at-least in 5 samples in one of the cohort to avoid bias due to genes mutated in single sample.
pt.vs.rt <- mafCompare(m1 = primary.apl, m2 = relapse.apl, m1Name = 'Primary', m2Name = 'Relapse', minMut = 5)
print(pt.vs.rt)
## $results
##    Hugo_Symbol Primary Relapse         pval         or       ci.up      ci.low
## 1:         PML       1      11 1.529935e-05 0.03537381   0.2552937 0.000806034
## 2:        RARA       0       7 2.574810e-04 0.00000000   0.3006159 0.000000000
## 3:       RUNX1       1       5 1.310500e-02 0.08740567   0.8076265 0.001813280
## 4:        FLT3      26       4 1.812779e-02 3.56086275  14.7701728 1.149009169
## 5:      ARID1B       5       8 2.758396e-02 0.26480490   0.9698686 0.064804160
## 6:         WT1      20      14 2.229087e-01 0.60619329   1.4223101 0.263440988
## 7:        KRAS       6       1 4.334067e-01 2.88486293 135.5393108 0.337679367
## 8:        NRAS      15       4 4.353567e-01 1.85209500   8.0373994 0.553883512
## 9:      ARID1A       7       4 7.457274e-01 0.80869223   3.9297309 0.195710173
##         adjPval
## 1: 0.0001376942
## 2: 0.0011586643
## 3: 0.0393149868
## 4: 0.0407875250
## 5: 0.0496511201
## 6: 0.3343630535
## 7: 0.4897762916
## 8: 0.4897762916
## 9: 0.7457273717
## 
## $SampleSummary
##     Cohort SampleSize
## 1: Primary        124
## 2: Relapse         58

Forest Plots

forestPlot(mafCompareRes = pt.vs.rt, pVal = 0.1)

Co-onco Plots

genes = c("PML", "RARA", "RUNX1", "ARID1B", "FLT3")
coOncoplot(m1 = primary.apl, m2 = relapse.apl, m1Name = 'PrimaryAPL', m2Name = 'RelapseAPL', genes = genes, removeNonMutated = TRUE)

Co-bar Plots

coBarplot(m1 = primary.apl, m2 = relapse.apl, m1Name = "Primary", m2Name = "Relapse")

Lollipop Plots

lollipopPlot2(m1 = primary.apl, m2 = relapse.apl, gene = "PML", AACol1 = "amino_acid_change", AACol2 = "amino_acid_change", m1_name = "Primary", m2_name = "Relapse")
## Gene: PML
## 9 transcripts available. Use arguments refSeqID or proteinID to manually specify tx name.
##    HGNC refseq.ID protein.ID aa.length
## 1:  PML NM_002675  NP_002666       633
## 2:  PML NM_033238  NP_150241       882
## 3:  PML NM_033239  NP_150242       829
## 4:  PML NM_033240  NP_150243       611
## 5:  PML NM_033244  NP_150247       560
## 6:  PML NM_033246  NP_150249       423
## 7:  PML NM_033247  NP_150250       435
## 8:  PML NM_033249  NP_150252       585
## 9:  PML NM_033250  NP_150253       781
## Using longer transcript NM_033238 for now.
## 9 transcripts available. Use arguments refSeqID or proteinID to manually specify tx name.
##    HGNC refseq.ID protein.ID aa.length
## 1:  PML NM_002675  NP_002666       633
## 2:  PML NM_033238  NP_150241       882
## 3:  PML NM_033239  NP_150242       829
## 4:  PML NM_033240  NP_150243       611
## 5:  PML NM_033244  NP_150247       560
## 6:  PML NM_033246  NP_150249       423
## 7:  PML NM_033247  NP_150250       435
## 8:  PML NM_033249  NP_150252       585
## 9:  PML NM_033250  NP_150253       781
## Using longer transcript NM_033238 for now.

Clinical enrichment analysis

The “clinicalEnrichment” function aims to perform enrichment analysis on clinical attributes associated with samples. By comparing mutations across various categories within a clinical characteristic, it identifies potential enrichments through both groupwise and pairwise comparisons. For example, the function can be utilized to discover mutations connected to FAB_classification.

fab.ce = clinicalEnrichment(maf = laml, clinicalFeature = 'FAB_classification')
## Sample size per factor in FAB_classification:
## 
## M0 M1 M2 M3 M4 M5 M6 M7 
## 19 44 44 21 39 19  3  3
#Results are returned as a list. Significant associations p-value < 0.05
fab.ce$groupwise_comparision[p_value < 0.05]
##    Hugo_Symbol Group1 Group2 n_mutated_group1 n_mutated_group2      p_value
## 1:        IDH1     M1   Rest         11 of 44         7 of 149 0.0002597371
## 2:        TP53     M7   Rest           3 of 3        12 of 190 0.0003857187
## 3:      DNMT3A     M5   Rest         10 of 19        38 of 174 0.0089427384
## 4:       CEBPA     M2   Rest          7 of 44         6 of 149 0.0117352110
## 5:       RUNX1     M0   Rest          5 of 19        11 of 174 0.0117436825
## 6:        NPM1     M5   Rest          7 of 19        26 of 174 0.0248582372
## 7:        NPM1     M3   Rest          0 of 21        33 of 172 0.0278630823
## 8:      DNMT3A     M3   Rest          1 of 21        47 of 172 0.0294005111
##          OR      OR_low    OR_high       fdr
## 1: 6.670592 2.173829026 21.9607250 0.0308575
## 2:      Inf 5.355415451        Inf 0.0308575
## 3: 3.941207 1.333635173 11.8455979 0.3757978
## 4: 4.463237 1.204699322 17.1341278 0.3757978
## 5: 5.216902 1.243812880 19.4051505 0.3757978
## 6: 3.293201 1.001404899 10.1210509 0.5880102
## 7: 0.000000 0.000000000  0.8651972 0.5880102
## 8: 0.133827 0.003146708  0.8848897 0.5880102
plotEnrichmentResults(enrich_res = fab.ce, pVal = 0.05, geneFontSize = 0.5, annoFontSize = 0.6)

Drug-Gene Interactions

The “drugInteractions” function analyzes the interplay between drugs and genes, in addition to exploring the potential for genes to be targeted by drugs, utilizing data sourced from the Drug Gene Interaction database.

dgi = drugInteractions(maf = laml, fontSize = 0.75)

The illustration presents diverse groups of genes that may be targeted by drugs, as well as the five most prominent genes linked to each group. Furthermore, it allows for the retrieval of drug-gene interaction specifics, like the collection of recognized drugs that engage with DNMT3A.

dnmt3a.dgi = drugInteractions(genes = "DNMT3A", drugs = TRUE)
## Number of claimed drugs for given genes:
##      Gene N
## 1: DNMT3A 7
#Printing selected columns.
dnmt3a.dgi[,.(Gene, interaction_types, drug_name, drug_claim_name)]
##      Gene interaction_types    drug_name drug_claim_name
## 1: DNMT3A                                            N/A
## 2: DNMT3A                   DAUNORUBICIN    Daunorubicin
## 3: DNMT3A                     DECITABINE      Decitabine
## 4: DNMT3A                     IDARUBICIN      IDARUBICIN
## 5: DNMT3A                     DECITABINE      DECITABINE
## 6: DNMT3A         inhibitor   DECITABINE   CHEMBL1201129
## 7: DNMT3A         inhibitor  AZACITIDINE      CHEMBL1489

Oncogenic Signaling Pathways

The OncogenicPathways function evaluates whether TCGA cohorts11 exhibit a significant rise in the prevalence of well-known Oncogenic Signaling Pathways.

OncogenicPathways(maf = laml)

##    Pathway  N n_affected_genes fraction_affected Mutated_samples
## 1:    PI3K 29                1        0.03448276               1
## 2:    NRF2  3                1        0.33333333               1
## 3:    TP53  6                2        0.33333333              15
## 4:     WNT 68                3        0.04411765               4
## 5:     MYC 13                3        0.23076923               3
## 6:   NOTCH 71                6        0.08450704               8
## 7:   Hippo 38                7        0.18421053               7
## 8: RTK-RAS 85               18        0.21176471              97
##    Fraction_mutated_samples
## 1:              0.005181347
## 2:              0.005181347
## 3:              0.077720207
## 4:              0.020725389
## 5:              0.015544041
## 6:              0.041450777
## 7:              0.036269430
## 8:              0.502590674
PlotOncogenicPathways(maf = laml, pathways = "RTK-RAS")

##bibliography

Mayakonda, Anand, De-Chen Lin, Yassen Assenov, Christoph Plass, and H. Phillip Koeffler. 2018. “Maftools: Efficient and Comprehensive Analysis of Somatic Variants in Cancer.” Genome Research 28 (11): 1747–56..

Network, Cancer Genome Atlas Research, Timothy J. Ley, Christopher Miller, Li Ding, Benjamin J. Raphael, Andrew J. Mungall, A. Gordon Robertson, et al. 2013. “Genomic and Epigenomic Landscapes of Adult de Novo Acute Myeloid Leukemia.” The New England Journal of Medicine 368 (22): 2059–74..

Mayakonda, Anand, De-Chen Lin, Yassen Assenov, Christoph Plass, and H. Phillip Koeffler. 2018–. “Maftools: Efficient and Comprehensive Analysis of Somatic Variants in Cancer.” Genome Research 28 (11, /content/28/11.cover.gif): 1747–56. https://genome.cshlp.org/content/28/11/1747.
Stratmann, Svea, Sara A. Yones, Markus Mayrhofer, Nina Norgren, Aron Skaftason, Jitong Sun, Karolina Smolinska, et al. 2021. “Genomic Characterization of Relapsed Acute Myeloid Leukemia Reveals Novel Putative Therapeutic Targets.” Blood Advances 5 (3): 900–912. https://doi.org/10.1182/bloodadvances.2020003709.